[section {PEG serialization format}]

Here we specify the format used by the Parser Tools to serialize
Parsing Expression Grammars as immutable values for transport,
comparison, etc.

[para]

We distinguish between [term regular] and [term canonical]
serializations.

While a PEG may have more than one regular serialization, exactly one
of them is [term canonical].


[list_begin definitions][comment {-- serializations --}]
[def {regular serialization}]

[list_begin enumerated][comment {-- regular points --}]
[enum]
The serialization of any PEG is a nested Tcl dictionary.

[enum]
This dictionary holds a single key, [const pt::grammar::peg], and its
value. This value holds the contents of the grammar.

[enum]
The contents of the grammar are a Tcl dictionary holding the set of
nonterminal symbols and the starting expression. The relevant keys and
their values are

[list_begin definitions][comment {-- grammar keywords --}]
[def [const rules]]

The value is a Tcl dictionary whose keys are the names of the
nonterminal symbols known to the grammar.

[list_begin enumerated][comment {-- nonterminals --}]
[enum]
Each nonterminal symbol may occur only once.

[enum]
The empty string is not a legal nonterminal symbol.

[enum]
The value for each symbol is a Tcl dictionary itself. The relevant
keys and their values in this dictionary are

[list_begin definitions][comment {-- nonterminal keywords --}]
[def [const is]]

The value is the serialization of the parsing expression describing
the symbol's sentential structure, as specified in the section
[sectref {PE serialization format}].

[def [const mode]]

The value can be one of three values specifying how a parser should
handle the semantic value produced by the symbol.

[include ../modes.inc]
[list_end][comment {-- nonterminal keywords --}]
[list_end][comment {-- nonterminals --}]

[def [const start]]

The value is the serialization of the start parsing expression of the
grammar, as specified in the section [sectref {PE serialization format}].

[list_end][comment {-- grammar keywords --}]

[enum]
The terminal symbols of the grammar are specified implicitly as the
set of all terminal symbols used in the start expression and on the
right-hand sides of the grammar rules.


[list_end][comment {-- regular points --}]

[def {canonical serialization}]

The canonical serialization of a grammar has the format specified in
the previous item, and additionally satisfies the constraints below,
which make it unique among all the possible serializations of this
grammar.

[list_begin enumerated][comment {-- canonical points --}]
[enum]

The keys found in all the nested Tcl dictionaries are sorted in
ascending dictionary order, as generated by Tcl's builtin command
[cmd {lsort -increasing -dict}].

[enum]

The string representation of the value is the canonical representation
of a Tcl dictionary. That is, it does not contain superfluous whitespace.

[list_end][comment {-- canonical points --}]
[list_end][comment {-- serializations --}]
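
[para]

The structural rules of the regular serialization can be checked with
Tcl's builtin [cmd dict] command alone. The following is a minimal
sketch, not part of the Parser Tools API. The procedure name
[cmd check_regular] is hypothetical, the mode names
[const value], [const leaf], and [const void] are assumed to be the
three modes listed above, and the parsing expressions under
[const is] and [const start] are treated as opaque values. The sketch
assumes Tcl 8.5 or later and well-formed dictionaries as input. It does
not detect duplicate symbol names.

[para]
[example {
# Hypothetical helper, for illustration only.
# Returns 1 if 'serial' has the shape of a regular PEG serialization.
proc check_regular {serial} {
    # The outer dictionary holds the single key pt::grammar::peg.
    if {[dict size $serial] != 1} { return 0 }
    if {![dict exists $serial pt::grammar::peg]} { return 0 }
    set g [dict get $serial pt::grammar::peg]

    # The grammar contents hold the keys 'rules' and 'start'.
    if {![dict exists $g rules] || ![dict exists $g start]} { return 0 }

    dict for {sym def} [dict get $g rules] {
        # The empty string is not a legal nonterminal symbol.
        if {$sym eq ""} { return 0 }
        # Each symbol carries its expression and its mode.
        if {![dict exists $def is] || ![dict exists $def mode]} { return 0 }
        # Assumed set of mode names.
        if {[dict get $def mode] ni {value leaf void}} { return 0 }
    }
    return 1
}

# A one-rule grammar written by hand. The expression values are
# placeholders in the style of the PE serialization format.
set serial {pt::grammar::peg {rules {Sign {is {/ {t -} {t +}} mode value}} start {n Sign}}}
puts [check_regular $serial]
}]
[para]

The check inspects only the dictionary layers described in this
section. Verifying the nested parsing expressions is left to whatever
implements the section [sectref {PE serialization format}].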

[subsection Example]

Assuming the following PEG for simple mathematical expressions

[para]
[include ../example/expr_peg.inc]
[para]

then its canonical serialization (except for whitespace) is

[para]
[include ../example/expr_serial.inc]
[para]
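
The whitespace caveat above exists because the serialization shown is
formatted for reading. As a rough illustration of the two canonical
constraints, the sketch below rebuilds the dictionary layers of a
regular serialization with sorted keys and lets Tcl's [cmd list]
command generate the string representation. The procedure name
[cmd canonical_sketch] is hypothetical. The sketch copies the nested
parsing expressions unchanged, so its result is canonical only if
those expressions already are.

[para]
[example {
# Hypothetical helper, for illustration only.
# Rebuilds the grammar dictionaries with keys in 'lsort -dict' order.
proc canonical_sketch {serial} {
    set g [dict get $serial pt::grammar::peg]

    set rules {}
    foreach sym [lsort -increasing -dict [dict keys [dict get $g rules]]] {
        set def [dict get $g rules $sym]
        # 'is' sorts before 'mode'. The expression is copied as is.
        lappend rules $sym [list is [dict get $def is] mode [dict get $def mode]]
    }

    # 'rules' sorts before 'start'. Building the result with 'list'
    # avoids superfluous whitespace between the dictionary elements.
    return [list pt::grammar::peg [list rules $rules start [dict get $g start]]]
}

# A regular serialization with unsorted keys and extra whitespace.
set regular {
    pt::grammar::peg {
        start {n Sign}
        rules {
            Sign  {mode value is {/ {t -} {t +}}}
            Digit {mode leaf  is {/ {t 0} {t 1}}}
        }
    }
}
puts [canonical_sketch $regular]
}]
[para]

The result is a single line in which [const Digit] precedes
[const Sign], and [const rules] precedes [const start].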
